SARS-CoV-2 clade dynamics and their associations with hospitalisations during the first two years of the COVID-19 pandemic

Background: The COVID-19 pandemic was characterised by rapid waves of disease, driven by the emergence of new and more infectious SARS-CoV-2 variants. How the pandemic unfolded in various locations during its first two years has yet to be sufficiently covered. To this end, we examine the circulating SARS-CoV-2 variants, their diversity, and hospitalisation rates in Estonia in the period from March 2020 to March 2022.

Methods: We sequenced a total of 27,550 SARS-CoV-2 samples in Estonia between March 2020 and March 2022. High-quality sequences were genotyped and assigned to Nextstrain clades and Pango lineages. We used regression analysis to determine the dynamics of lineage diversity and the probability of clade-specific hospitalisation stratified by age and sex.

Results: We successfully sequenced a total of 25,375 SARS-CoV-2 genomes (92%), identifying 19 Nextstrain clades and 199 Pango lineages. In 2020 the most prevalent clades were 20B and 20A. The subsequent waves of infection were driven by 20I (Alpha), 21J (Delta), and the Omicron clades 21K and 21L. Lineage diversity, as measured by the Shannon index, was at its highest during the Delta wave. About 3% of sequenced SARS-CoV-2 samples came from hospitalised individuals. Hospitalisation increased markedly with age in the over-forties, and was negligible in the under-forties. Vaccination decreased the odds of hospitalisation in the over-forties. The effect of vaccination on hospitalisation rates was strongly dependent upon age but was clade-independent. People who were infected with Omicron clades had a lower hospitalisation likelihood in age groups of forty and over than was the case with pre-Omicron clades, regardless of vaccination status.

Conclusions: COVID-19 disease waves in Estonia were driven by the Alpha, Delta, and Omicron clades. Omicron clades were associated with a substantially lower hospitalisation probability than pre-Omicron clades. The protective effect of vaccination in reducing hospitalisation likelihood was independent of the involved clade.


Introduction
As was largely the case in other countries, Estonia experienced multiple COVID-19 waves following the emergence of the pandemic. The first wave occurred between February and June 2020, with a peak hospitalisation of around twelve COVID-19 patients per 100,000 people per day in April 2020, and was accompanied by a national lockdown which lasted for nine weeks. The next waves were driven by the emergence of the Alpha, Delta, and Omicron variants of concern (VOC) [1][2][3][4][5], with three peaks of hospitalisation of between fifty and sixty COVID-19 patients per 100,000 people per day in March 2021, November 2021, and February 2022 respectively (https://www.terviseamet.ee/en/coronavirus-dataset). A partial lockdown was established between March and May 2021 in response to overwhelming hospitalisation levels. The compulsory use of face masks was implemented between August 2021 and April 2022, and all restrictions were lifted in June 2022. Vaccination against SARS-CoV-2 has been available for essential healthcare and social services workers and risk groups since 27 December 2020, followed by those over the age of forty in May 2021, and the 12-15 year-old group in June 2021.

VOCs have been associated with COVID-19 disease severity and transmissibility. The latter increased between Alpha, Delta, and Omicron [6], but Omicron was shown to cause less severe disease [7]. There were two major peaks of hospitalisation in Estonia, in October-November 2021 and February-March 2022 respectively. Intensive care unit admissions peaked in Estonia in October-November 2021 (https://www.terviseamet.ee/en/coronavirus-dataset), suggesting that the SARS-CoV-2 variants which were circulating in Estonia during those waves could be associated with more severe disease outcomes. Here we report the results of the SARS-CoV-2 whole genome sequencing study which was carried out in Estonia between March 2020 and March 2022. We undertook this study in order to understand how distinct SARS-CoV-2 clades influenced the severity of COVID-19 and hospitalisation, and which other factors, including immunisation against SARS-CoV-2, contributed to this process.

Study design
The study used SARS-CoV-2 positive PCR test sample leftovers for the purpose of genotyping. The samples were collected and stored by the Estonian Health Board, Synlab Eesti OÜ, and hospitals, at dates between 13 March 2020 and 31 March 2022 as part of the national SARS-CoV-2 PCR testing process. To select samples for genotyping, various sampling strategies were applied according to public health needs in Estonia. During the first wave in 2020, samples were sequenced from the first cases and local SARS-CoV-2 outbreaks. After the first wave, sequencing was carried out by using randomly-selected samples (median 336 samples per week). In addition, targeted samples were occasionally sequenced from hospitalised individuals, regardless of the reason for hospitalisation, and from those who had a history of international travel in the 14 days prior to their producing a positive PCR test for SARS-CoV-2 (Fig 1). Sample genotyping data was last accessed on 11 January 2023.

SARS-CoV-2 sequencing
During the study period two different protocols were used for sequencing. Between March 2020 and January 2021, RNA was reverse-transcribed and amplified in 2.5 kb products [8]. Pooled amplicons of each sample were used to prepare NGS libraries with Nextera XT and were clustered with MiSeq Reagent Kit v2 (500-cycles) on 2 x 150-cycle paired-end runs (Illumina Inc, San Diego, CA, USA). Since February 2021, an Illumina COVIDSeq Test Kit Artic

Statistical analysis
All data analyses were conducted using R v4.2.1 (2022-06-23). The study population baseline characteristics table was generated by using the gtsummary R package v1.6.3 [9]. Bayesian modelling was carried out using the R libraries rstan v2.21.7 [10] and brms v2.18.0 [11]. Weakly informative priors were used to fit the models, with a minimum of 2,000 iterations (half of which were warm-up iterations) and three to four chains. The work of extracting, summarising, and visualising draws from brms models was carried out with the tidybayes v3.0.2 [12] and emmeans v1.8.1-1 [13] R packages. The Shannon index (H) was calculated by using the vegan R package v2.6-2 [14] as $H = -\sum_{i} p_i \log(p_i)$, where $p_i$ is the proportion of the $i$-th species in an entire community. In terms of diversity analyses, the species was defined as a distinct Pango lineage. The evenness index (J) was calculated as $J = H / \log(S)$, where $H$ is the Shannon index and $S$ is the observed number of Pango lineages. Raw data manipulations, including importation, transformation, and summaries, were generated with the tidyverse R package v1.3.1 [15]. Plots were prepared using the ggplot2 R package v3.3.6 [16].
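As an illustration of the two diversity measures defined above, the following is a minimal sketch in Python rather than the vegan R package used in the study. The natural logarithm is assumed, the function names are hypothetical, and the lineage counts are invented for demonstration only.

```python
import math
from collections import Counter


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over lineages with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one observation is required")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def evenness(counts):
    """Evenness J = H / ln(S), where S is the observed number of lineages."""
    s = sum(1 for c in counts if c > 0)
    if s < 2:
        return float("nan")  # ln(1) = 0, so J is undefined for a single lineage
    return shannon_index(counts) / math.log(s)


# Hypothetical week in which one lineage accounts for 99% of sequenced samples
# (invented counts, not study data).
week = ["B.1.1.7"] * 99 + ["B.1.258"]
counts = list(Counter(week).values())
h = shannon_index(counts)
j = evenness(counts)
print(f"H = {h:.3f}, J = {j:.3f}")
```

When a single lineage dominates, both H and J fall towards zero, which is the pattern the text describes for the Alpha wave.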

Study population
Table 1 presents the characteristics of the study population as stratified by the VOC waves. Overall, our study population's sex ratio and age group distribution match up well with the population of SARS-CoV-2 RT-PCR-positive people in Estonia. Binomial proportion analysis showed that the study population consisted of 1.6-2.8 percentage points more females than males while, in the target population of SARS-CoV-2 RT-PCR-positive people in Estonia, females were even more prevalent than males (4.1-4.3 percentage points). The median age of individuals (37 years of age) was similar in both the study and the target population. Age groups between twenty and sixty years were over-represented relative to their prevalence in the entire Estonian population (S1A Fig). Most people who presented with COVID-19 symptoms at the time of testing (96%) and about two-thirds of the people overall (66%) were infected in Estonia. About 3% of SARS-CoV-2 RT-PCR-positive people were hospitalised. During the study period the prevalence of people in the below-twenty year-old age group increased, while the prevalence of people aged between 60-79 decreased (both trends have a posterior probability of >0.999). No substantial changes were observed in the prevalence of other age groups (S1B Fig). The sex ratio showed a substantial decrease in the proportion of males in the under-twenty group and the 20-39 age range (with a posterior probability of 0.995 and 0.999 respectively), with no changes in age groups above those ages (see S2 Fig).

The prevalence of clades and lineages during the VOC waves
Overall, 19 Nextstrain clades and 199 Pango lineages were identified during the study period. Those variants which belonged to the Delta (43%), Omicron (27%), and Alpha (23%) VOCs were the most abundant of all sequenced SARS-CoV-2 genomes (Table 1 and S1 Table). During the pre-VOC period in 2020, the two most prevalent clades were 20B (51%) and 20A (42%) (Fig 3A). The Alpha VOC (20I) was first identified in Estonia on 31 December 2020 in a sample which had been collected from an individual with travel history. Additional cases of the Alpha VOC were identified in January 2021, followed by the local spread of the variant which eventually led to the Alpha wave, from W9 to W22 in 2021. Clades which belonged to the Delta VOC emerged thereafter and out-competed the Alpha VOC by W26 in 2021 (Fig 3A and 3B). During the Delta wave we detected three clades (21A, 21I, and 21J), with 21J emerging as dominant. In contrast to the Alpha and Delta waves, the Omicron VOC resulted in two consecutive waves, first with 21K (BA.1 and its sub-lineages) from W52 in 2021 to W5 in 2022, and then with 21L (BA.2 and its sub-lineages), starting in W7 in 2022 (Fig 3A and 3B).

We detected a total of 27 different lineages in 2020, with the three most prevalent lineages being B.1.1 (28%), B.1.1.10 (18%), and B.1.258 (13%). The Alpha wave in 2021 was characterised by the dominance of the B.1.1.7 lineage (99%) whereas, during the Delta wave, we detected a total of 82 different Delta VOC lineages and sub-lineages, with AY.122 being predominant (43%), followed by AY.100 (10%) and AY.43 (7.3%). During the two Omicron waves up to 31 March 2022, we detected 28 BA.1 and 16 BA.2 sub-lineages (see S1 Table).

The association between travel-related cases and VOC
We found that the proportion of travel-related cases was about 20% among people who had become infected with the Alpha VOC, the Delta VOC, or other variants, and about 40-50% among those who had become infected with the Omicron or Beta VOC (S3 Fig). These cases were associated with 88 different countries. The top five countries from which VOCs had mainly been imported included Estonia's neighbouring countries (Finland, Sweden, and Russia) and popular travel destinations (Egypt and Great Britain). The Beta VOC import was related to a very

Higher diversity amongst travel-related cases
We identified a total of 165 different Pango lineages from the period for which we had sample origin information (starting from 7 February 2021). A total of 57 (35%) of those were unique to individuals who reported that they had travelled, compared to 23 (13%) lineages which were uniquely assignable to domestic infections, while 86 (52%) lineages were present in both groups. Out of those 86 lineages which were found in both groups, a total of 51 (59%) were first identified in individuals who had reported that they had travelled. The Shannon diversity index of Pango lineages decreased steeply during the Alpha and Omicron waves, in contrast with the Delta wave, in which diversity rebounded quickly after the Alpha wave and remained at a relatively higher level until the end of the wave (Fig 4A). Diversity displayed small peaks at the beginning of waves and was higher amongst travel-related cases when compared to domestic cases in about 68% of weeks during the Alpha, Delta, and Omicron waves (Fig 4B). The second half of the Delta wave was characterised by higher levels of diversity in domestic cases. The observed number of lineages was higher during the Delta wave and the first Omicron wave (21K, BA.1 and its sub-lineages) when compared to the Alpha wave and

Neither the European Union nor the granting authority can be held responsible for them. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests:
The authors have declared that no competing interests exist.

The association of RT-PCR Cq values with VOC
We observed that people who had become infected with the Delta VOC had lower average RT-PCR Cq values of the ORF1a, S gene, and N gene than people who had been infected with the Alpha or Omicron VOC (16.6 versus 18.2 and 18.2 respectively) (Table 1

The association between hospitalisation and clade and vaccination
We found that the probability of hospitalisation was strongly related to age and COVID-19 vaccination status (Fig 5A). Overall, the probability of hospitalisation was very low for people who were under forty years of age but started to increase after that, whereas vaccinated people had a lower probability of being hospitalised when compared to unvaccinated people. The beneficial effect of any vaccination increased considerably with age, as the odds of hospitalisation

Discussion
In this countrywide study, we characterise the circulating variants in Estonia between March 2020 and March 2022 by genotyping about 7% of SARS-CoV-2-positive samples. We demonstrate the following: 1) the prevalence of the clades in Estonia resembled that in other European countries; 2) different VOC waves varied in terms of Pango lineage diversity, with the highest levels of diversity being seen in the Delta wave and the lowest in the Alpha wave; 3) the probability of hospitalisation was associated with clade, as two Omicron clades (21K and 21L) displayed lower hospitalisation rates when compared to all other common clades; 4) the protective effect of vaccination against severe disease, with such disease being associated with hospitalisation, was seen in individuals who were aged forty and above, with the effect strongly increasing with age and being independent of the involved clade.

Changes in the most highly-affected population groups during the pandemic
The genotypic surveillance of SARS-CoV-2 in Estonia presents trends which generally match those of the rest of Europe, being characterised by the prevalence of clades 20A and 20B in 2020, and then in 2021-2022 by three successive waves of SARS-CoV-2 which were dominated by the Alpha, Delta, and Omicron VOC in that order (https://www.ecdc.europa.eu/en/covid-19). As for the population subgroups which were affected, as with others [17], we observed that individuals who were aged between 20-59 years were over-represented in Estonia (>30%) relative to their proportion in the general population (26%). Again as with others [18,19], we observed an age-group prevalence shift from individuals who were aged forty and above during the first waves of the pandemic to individuals who were below forty, with the most substantial increase being seen during the Omicron waves in individuals who were aged nineteen and below. We could speculate that such shifts can be attributed to several measured and non-measured factors, such as asymptomatic infections and/or milder symptoms in younger people being tested, the earlier vaccination of older at-risk age groups when compared to people in the younger age groups, public health efforts to manage the pandemic, the enforcement of mitigation measures, and compliance with such measures. The introduction of antigen quick-testing in schools in November 2021 may also have substantially contributed to the measured increase in the prevalence of infections among those aged nineteen and below.

Lineage diversity during the waves
Lineage diversity during the Alpha wave was distinct from that of all other waves, including the pre-VOC wave and the Delta and Omicron waves, through the overwhelming dominance of one single lineage: B.1.1.7. The B.1.1.7 lineage was more transmissible than its pre-VOC predecessors, resulting in its worldwide spread, and it became dominant in a large number of countries [20][21][22]. Omicron waves displayed a similar pattern to Alpha within our study timeframe, albeit with a substantial decrease in diversity. Diversity was at its highest during the Delta wave and, in contrast to the Alpha and Omicron waves, remained high throughout the wave. We could speculate that higher Delta diversity was associated with higher virus loads, as Delta infections were shown to carry about six times more viral RNA when compared to Alpha [23], something which is supported by our findings for the two Delta clades which displayed the lowest Cq values. Interestingly, we observed an apparent increase in the diversity of the imported variants before or at the beginning of new waves. Variants which were present only in those individuals who reported that they had recently travelled, together with variants which were first detected in individuals who had reported travelling, represent 70% of all variants which were circulating in Estonia during the same period, suggesting 70% efficiency in intercepting new variants entering the country.

Hospitalisation
As is the case with others [24,25], our analysis of hospitalisation data suggests that the risk of hospitalisation was lower in the case of two Omicron clades when compared to all other common clades which were detected in Estonia during the study period. There were no differences in hospitalisation rates between all the other clades. We can see slightly increased hospitalisation rates in Omicron-infected young people (those under the age of twenty) when compared to other clades, a trend which has also been observed by others [24]. Importantly, our data suggest that vaccination substantially reduced the probability of hospitalisation, especially in older populations which were at a higher risk of severe illness, irrespective of the involved virus clade. The vaccination effect on hospitalisation rates was strongly related to age, and tended to wane in younger age groups. We can speculate that the increasing protective effect of vaccination on hospitalisation rates with age, particularly in those who were aged forty and above, can at least partially be attributed to a higher adaptive immune response to SARS-CoV-2 [26].

The significance and implementation of SARS-CoV-2 sequencing
As in many countries, Estonia initiated SARS-CoV-2 sequencing in order to characterise the mutation patterns of the virus, and to complement the epidemiological data of outbreak management, including contact tracing, with viral genetic information. In 2021 the detection of new variants and potentially more virulent strains which were circulating around the world became the main focus of sequencing. This information was constantly integrated into decision-making by the Estonian Health Board and the government. It is also important to acknowledge that, most of the time, we were sequencing at least the minimum number of samples recommended by ECDC, enabling us to describe the variants almost in real time against the background of the constantly and rapidly evolving COVID-19 pandemic.
We emphasise that, based upon an in-depth analysis of SARS-CoV-2 sequencing outcomes, it is possible to devise a sequencing strategy for future crises. For instance, the SARS-CoV-2 sequencing project made it possible for us to work out, from scratch, an effective sequencing approach at the national level; however, the downstream analysis further highlighted the significance of the proper stratification of samples in order to sufficiently cover various population groups and societal segments. Countries around the globe, including Estonia, expended substantial resources in conducting SARS-CoV-2 sequencing. Therefore, it is crucial to assess the optimal means by which such information can be acquired in the future. Our study provides valuable information for formulating strategy in the event of a future pandemic which resembles the COVID-19 pandemic. Data which have been generated during this project could be integrated into modelling efforts which are aimed at enhancing levels of preparedness for future pandemics or epidemics.

Strengths and limitations
The major strength of this study is that our sequencing data were complemented with hospitalisation and vaccination data retrieved from national health databases, giving us a unique opportunity to explore the associations between viral genetics and disease outcomes. Our study has several limitations. These include the uneven coverage of sample metadata, with less metadata being available for those samples which were collected at the beginning of the pandemic, and time-dependent confounding from multiple factors, such as the different sets of public health mitigation measures which were being implemented during the consecutive virus waves, acquired immunity from SARS-CoV-2 infections, vaccine availability, and vaccination coverage, which together may have obscured the associations between clades and hospitalisation rates. Some of the analyses were affected by small sample sizes, which is why we pooled all the various vaccines in order to assess vaccine protection levels. Additionally, due to the small sample size, we could not reliably analyse the association of SARS-CoV-2 clades with the use of intensive care units or with mortality rates in Estonia.

Conclusions
This national SARS-CoV-2 sequencing study from the first two years of the COVID-19 pandemic (March 2020 to March 2022) revealed the following:
1) disease waves in Estonia were driven by the Alpha, Delta, and Omicron clades;
2) those variants which belonged to the Omicron clade caused substantially less severe levels of illness and hospitalisation when compared to preceding variants;
3) the COVID-19 vaccination effect on hospitalisation rates was independent of the involved SARS-CoV-2 clade.
During the COVID-19 pandemic, such data helped state agencies to understand how SARS-CoV-2 spread around Estonia and how measures to tackle the pandemic could be implemented. The study also made it possible to follow the evolution of the virus during the pandemic. When the Omicron wave reached Estonia, the virus spread faster, but the disease had become significantly milder by then, so strict restrictions to control the spread of the virus were no longer necessary.

Fig 1. The workflow for the study population. Epidemiological data availability means that at least the date of sampling and age or sex are present. 'ECDC' refers to the European Centre for Disease Prevention and Control in Sweden. https://doi.org/10.1371/journal.pone.0303176.g001

Fig 2. Testing and sequencing SARS-CoV-2 in Estonia. The bars show the number of tests which were carried out (left axis) and the line denotes the percentage of positive samples sequenced (right axis). Vertical lines denote the duration of waves, defined by the prevalence of Nextstrain clade(s). The wave start and end were defined as a week in which the lower or upper bound respectively of the 95% confidence interval of clade prevalence, as obtained through the Clopper-Pearson method, crossed the 50% threshold. SARS-CoV-2 test results were downloaded from the Estonian COVID-19 open-data portal (https://opendata.digilugu.ee; last accessed 5 March 2024). https://doi.org/10.1371/journal.pone.0303176.g002
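The wave definition in the Fig 2 caption can be sketched as follows. This is one possible reading, not the authors' implementation: the start is taken as the first week in which the lower Clopper-Pearson bound exceeds 50%, and the end as the first later week in which the upper bound falls below 50%. The function names and weekly counts are hypothetical, and the exact interval is found by bisection on the binomial tail probabilities.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with 0 < p < 1 (computed in log space)."""
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return total


def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion k/n."""
    def solve(f, target):
        # f must be monotonically increasing in p on (0, 1)
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else solve(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2, i.e. P(X > k | p) = 1 - alpha / 2
    upper = 1.0 if k == n else solve(lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


def wave_bounds(weekly, threshold=0.5):
    """weekly: iterable of (week_label, clade_count, total_sequenced), in time order."""
    start = end = None
    for week, k, n in weekly:
        if n == 0:
            continue  # no sequenced samples, prevalence cannot be estimated
        lower, upper = clopper_pearson(k, n)
        if start is None:
            if lower > threshold:
                start = week
        elif upper < threshold:
            end = week
            break
    return start, end


# Hypothetical weekly counts of one clade among 100 sequenced samples per week
weeks = [("W7", 20, 100), ("W8", 45, 100), ("W9", 70, 100),
         ("W15", 95, 100), ("W22", 55, 100), ("W23", 30, 100)]
print(wave_bounds(weeks))  # ('W9', 'W23')
```

Using the interval bounds rather than the point estimate makes the boundaries conservative: a week with few sequenced samples has a wide interval and is less likely to trigger a start or an end.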

Fig 4. The diversity of SARS-CoV-2 lineages in domestic and travel-related cases in Estonia. (A) Shannon diversity index. Points denote individual weekly observations. The line denotes the autoregressive model which has been fitted to the data, and the shaded ribbon denotes a 95% credible interval, N = 14,341; (B) the effect size of the Shannon index, imported cases compared to domestic cases. The line denotes the effect size as derived from the model fit which is shown in Panel A, and the shaded ribbon denotes a 95% credible interval. The model summary is presented in Table 3 in the S2 Appendix. https://doi.org/10.1371/journal.pone.0303176.g004

[contract number 1.1-6.2/21/298] and by the University of Tartu. This work was funded partially by the European Union through HORIZON Coordination and Support Actions [grant agreement 101079349] "Boosting the One Health Research Excellence and Management Capacity of the Estonian University of Life Sciences". Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or European Health and Digital Executive Agency. Neither the European Union nor the granting authority can be held responsible for them. This work was funded partially by the European Union through EU4Health Programme (EU4H) [grant agreement 101102733] "Delivering a Unified Research Alliance of Biomedical and public health Laboratories against Epidemics". Views and opinions expressed are

limited set of countries and did not result in a wave in Estonia. Compared to the Beta VOC, the Omicron VOC displayed a wider geography, one which was similar to that of the Alpha and Delta VOCs (see S3C Fig).

Table 1. The characteristics of the study population. Column headers: Total SARS-CoV-2 positive samples; SARS-CoV-2 positive samples sequenced.
the second Omicron wave both in travel-related and domestic cases (21L, BA.2 and its sub-lineages) (S5A Fig). For most of the weeks under consideration, the number of unique lineages was higher amongst travel-related cases (S5B Fig). The weekly number of unique lineages in travel-related cases displayed a substantial peak relative to domestically circulating lineages at the beginning of the first Omicron wave (S5A and S5B Fig). The evenness index sharply decreased during the Alpha wave when 99% of cases belonged to the B.1.1.7 lineage (see S1 Table), but this rebounded before the Delta wave, and remained relatively stable afterwards with no single lineage becoming prevalent (S5C Fig). The evenness index was higher in imported cases in about 66% of overall weeks during the Alpha, Delta, and Omicron waves (S5D Fig).

Table 1. (Continued) Displayed in the table are the epidemiological and clinical characteristics of populations which were infected with SARS-CoV-2 VOCs in Estonia between March 2020 and March 2022. Other VOCs consist of Beta and Gamma VOCs. See S1 Table for lineages which belong to strata. IQR, interquartile range; NA, not applicable; SD, standard deviation. The dimension of values given in parentheses is shown in the 'Variable' column.